ABSTRACT



A method and system for executing branch or other instructions
in a loop. A loop end condition is evaluated in a fixed
point unit while floating point instructions are evaluated in
a floating point unit. In a first execution of the instructions
in the loop, the loop end condition is processed as in prior
art. A branch target instruction is stored in a branch target
register and an instruction address of the branch target
instruction is stored in a branch address register. However,
on subsequent execution of the instructions in the loop, the
branch condition is evaluated and, if it is fulfilled, once the
end of the loop is detected by comparison of the effective
address of the next instruction to be executed with the
contents of the branch address register, the effective address
of the first instruction in the loop is passed from the branch
target register to an operations register.

15 Claims, 6 Drawing Sheets 
METHOD FOR EXECUTING BRANCH INSTRUCTIONS BY PROCESSING LOOP END CONDITIONS IN A SECOND PROCESSOR

The application is a continuation of application Ser. No. 08/234J80, filed Apr. 28, 1994, now abandoned.

BACKGROUND OF THE INVENTION

The invention relates generally to computer systems and deals more particularly with an improved technique for executing branch instructions.

In a computer program, branch instructions are encountered frequently and are executed in various ways. U.S. Pat. No. 5,070,475 to Normoyle et al. discloses a data processing system which includes a floating point computation unit (FPU) which interfaces with a central processing unit (CPU). The CPU supplies a dispatch control signal to inform the FPU that it is about to execute a floating point microinstruction and supplies a dispatch address which includes the starting address of the floating point microinstructions during the same operating cycle that the dispatch control signal is supplied. A buffer memory is provided in the FPU to store the starting address of one decoded macroinstruction while a sequence of microinstructions for a previously decoded macroinstruction is being executed by the FPU.

U.S. Pat. No. 5,070,475 also discloses interface logic which handles suitable control signals for permitting asynchronous operation of the FPU and the CPU and which utilizes a single level of pipelining macroinstructions for initiating FPU operations. Suitable control signals are used in order to permit the transfer of FPU instruction information and to arrange for the proper loading and subsequent use thereof by the FPU. Further control is required to assure that the CPU does not transfer an FPU instruction when the single buffer pipeline at the FPU is full and unable to accept the FPU instruction.

U.S. Pat. No. 5,070,475 also discloses control signals which provide for the transfer of data in either direction between the CPU data bus and the FPU data bus. Moreover, other control signals are provided for handling floating point faults which may occur during the calculations being executed by the FPU.

U.S. Pat. No. 4,509,116 to Lackey et al. discloses an interconnection arrangement between a CPU and an FPU (called a "special instruction processor"). The CPU retrieves all of the microinstructions from the memory in series and decodes the instruction. An image of the instruction is passed to the FPU. When an instruction is received which requires processing by the FPU, then the CPU retrieves the data words comprising the operand from the memory and passes them to the FPU. After receiving the instruction, the FPU also decodes the instruction and proceeds to receive the data words comprising the operand of the instruction. The FPU then processes the operand in a conventional manner and prepares to transmit back to the CPU the results of the processing, i.e. the processed data and any condition codes. When the CPU is signalled by the FPU that it has finished processing, it signals the FPU to transmit the data. The CPU is then able to transmit the processed data back into storage in the memory.

U.S. Pat. No. 4,683,547 to DeGroot teaches a data processing system which includes a multiple floating point arithmetic unit with a putaway and a bypass bus. The FPU includes a new instruction for handling multiple multiply and divide instructions. These instructions include passing the results of each multiple/divide on a bypass bus to the input of an adder along with the inputs from an accumulate bypass bus which is the output from the adder for an automatic add operation on an accumulate multiply or accumulate divide operation. This allows two floating point results to be produced in each cycle, one of which can be accumulated without any intervening control by the CPU.

U.S. Pat. No. 4,654,785 to Nishiyama et al. discloses an information processing system having a plurality of arithmetic units such as a general instruction arithmetic unit or CPU and a floating point instruction arithmetic unit or FPU. The information processing system includes means provided for each of the arithmetic units which generates a condition code for use in branch judgement of a conditional branch instruction. Within each arithmetic unit, branch judgement means are provided which judge the success or failure of a branch of the conditional branch instruction by using the condition generated by the code generating means. A judgement unit decision circuit is also provided which is responsive to the operation state of each arithmetic unit for generating an instruction signal indicating which of the branch judging means is to be operated and to supply the instruction signal to the branch judgement means, whereby branch control is carried out by using as a valid result either one of the branch judgement results obtained in the respective arithmetic units.

An article entitled "Repeating Microcode Words for Fast Controlled Repeat Cycle Functions", IBM Technical Disclosure Bulletin, vol 32, no 5B, October 1989, pp 403-404, teaches a repeat cycle enabling function in a microprogram controlled processor. In the disclosure, a microword control latch is set as each of the looping microcontrol words is being executed. This latch controls the gating of the microcontrol words into the control register. If the latch is ON, the control word clocked into the control register at the beginning of the next cycle will be from the output of the current control register. If the latch is OFF, the control word clocked into the control register at the beginning of the next cycle will be the output of the control storage.

An article entitled "Zero-cycle Branches in simple RISC Designs", IBM Technical Disclosure Bulletin, vol 33, no 10B, March 1991, pp 253-259, teaches a method of reducing pipeline delay in a RISC system by providing a branch execution unit which executes the branches without interrupting with or using standard fixed point instruction resources. The branch execution unit attempts to make branches all but invisible to the fixed-point and floating-point execution units. Software support is needed in order to [...] execution unit.

In high end machines, a number of methods are known in which the number of cycles required to carry out a branch instruction is reduced to either zero cycles or one cycle. In general, these allow the next branch cycle to be processed while previous instructions are being executed. Assuming that the previous instructions neither affect the fulfillment of the branch condition nor the generation of the address to which the microprogram jumps, then the branch condition will be calculated and the address of the next instruction to be processed placed into the instruction buffer. Such implementations require a higher amount of computer power and extra circuitry to control the parallel data flows. In addition it may not be possible to provide downward compatibility with existing microcode sequences.

A general object of the present invention is to produce a more efficient method for the execution of branch or loop instructions.
SUMMARY OF THE INVENTION

The invention resides in a processor and a dependent co-processor in a computer system, wherein in one of the processors, instructions in the loop are processed while simultaneously in the other processor, a loop end condition is processed.

According to one feature of the invention, during the processing of the instructions in the loop, the index value for the branch instruction is calculated, the calculated index value is compared with a branch condition value, and if the branch condition value is equal to the calculated index value then a successful branch indicator latch is set. A set successful branch indicator latch indicates that the instructions in the loop are to be performed again. Otherwise the next instruction outside the loop is performed.

In a preferred embodiment of the invention, the co-processor is a floating point unit. Also, a first branch address register stores the address of the branch instruction and a second branch address register stores the address of the target instruction to be carried out if the branch instruction condition is fulfilled. First and second auxiliary registers store the register numbers of the branch microcode instruction and first and second latches control the calculation of the branch condition.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates the format of a prior art BXLE branch instruction.

FIG. 2 illustrates a circuit used for calculating the address of the next instruction to be executed.

FIG. 3 illustrates data flow in the present invention and the addressing mechanism of a general purpose register.

FIGS. 4a and 4b illustrate the execution of a BXLE branch instruction in the prior art.

FIGS. 5a, 5b, and 5c illustrate the execution of a BXLE branch instruction according to the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE PRESENT INVENTION

Referring now to the Figures in detail wherein like reference numerals indicate like elements throughout the Figures, FIG. 1 illustrates a BXLE (Branch on Index Low or Equal) instruction [...] R1 and R3, and a branch address D2.

The BXLE instruction is described in detail in the "Enterprise Systems Architecture/390 Principles of Operation" (IBM Publications Number SA22-7201 available from Mechanicsburg, Pa.). The instruction adds an increment to a first operand which is stored in a register whose address is given in the field R1 and then compares the sum with a compare value. The result of this comparison determines whether or not branching occurs. Subsequently, the sum is placed at the location of the first operand (i.e. in the register whose address is given in the field R1). The address stored in field D2 is the address to which branching occurs. For BXLE, when the sum is low or equal to the compare value, the next instruction address in the current PSW is replaced by the branch address in field D2.

When the value in the R3 field is even, it designates a pair of registers; the contents of the even and odd registers of the pair are used as the increment and the compare value respectively. When the value in the R3 field is odd, it designates a single register, the contents of which are used as both the increment and the compare value.

A typical program loop which uses the BXLE instruction would be one in which the products of vectors are calculated. Such a program could contain the following instructions:

In this excerpt from a typical program the value stored in a register R2 is loaded into the register R1 as indicated by the LDR instruction. This requires one cycle. The value in the register R1 is multiplied by a value in the cache and stored in the R1 register as given by the MD instruction. This requires another cycle. The value in register R1 is then added to the value in the cache and placed in register R1, as given by the AD instruction. This requires one cycle. The value in the register R1 is then stored in the cache, as given by the STD instruction. This requires a further cycle. Finally the BXLE instruction is executed. As explained below in connection with FIGS. 4a and 4b, this requires, in prior art implementations, three cycles if a branching operation is carried out and otherwise only two cycles.

FIG. 2 shows the hardware used to generate the address of the instruction to be executed. In address generator 100, [...] instruction in the program is generated, e.g. by a branch instruction such as BXLE, and is passed to an instruction address register (IAR) 103 through a multiplexer 101. The target address is passed to an instruction buffer selector 112 which selects the addressed instruction out of the instruction buffer 110. The instruction buffer 110 is connected to a first operations register 120 and in turn to a second operations register 130.

During execution of the instruction, the IAR modifier 107 addresses the next sequential instruction of the program. The IAR modifier 107 is a network which allows the calculation of this instruction. The output of the IAR modifier is connected to the instruction address register 103 through line 108 and the multiplexer 101.

[...] input to the multiplexer 109 is connected to a branch target register 140 which stores the address of the [...] instruction in a program loop as will be described below. This address is generated from the field D2 of the branch instruction.

The multiplexer 109 is controlled by a signal from an AND gate 150. The AND gate 150 has three inputs. One input is from a successful branch (SFB) condition latch SFB 240, another input is a successful loop condition signal SBC. The third input comes from the output of an address comparator 160. The address comparator 160 compares the address in the IAR modifier (IAR Mod) 107 with the address in a branch address register 165 and produces a signal if the two addresses are identical.

FIG. 3 shows the data flow in one embodiment of the present invention. The first operations register 120 is connected to a supplementary register 220 which stores and decodes the address of the registers given by the R1 and R3 fields of the BXLE instruction. The first operations register 120 is also connected to an address decoder 200. The address
decoder 200 decodes the address in a general purpose register file 250 of the registers given by the R1 and R3 fields of the BXLE instruction.

Both the address decoder 200 and supplementary register 220 are connected to multiplexers 225 and 230 which are controlled by a signal from the successful branch (SFB) condition latch 240. The multiplexers 225 and 230 are so controlled that they pass the decoded addresses in the address decoder 200 when the successful branch (SFB) condition latch 240 is not set, otherwise they pass the decoded addresses stored in the supplementary register 220.

The multiplexer 225 passes the value of the address stored in the R1 field of the BXLE instruction to the general purpose register file 250. The value stored in this address is passed to an A register 260 of an arithmetic and logic unit 280. The multiplexer 230 passes the value of the address stored in the R3 field of the BXLE instruction to the general purpose register file 250. The value stored in this address is passed to a B register 270 of the arithmetic and logic unit 280.

Connected to the arithmetic and logic unit 280 is a D register 290 which stores the output of the arithmetic and logic unit 280 and passes the output to either the A register 260, the B register 270 and/or the general purpose register file 250. Also connected to the output of the arithmetic and logic unit 280 is a branch decoder 300 which produces a successful branch condition SFB signal if the conditions for a branch instruction are met as will be described below.

The processing of the BXLE instruction in the prior art will now be described with the help of FIGS. 4A and 4B. Crosses in the Figs. indicate at which point in the cycle the action is carried out. Thick horizontal bars in the FIGS. 4A and 4B indicate during which periods a signal or data is valid.

In the first cycle of operation, the address of the instruction A from the IAR modifier 107 is taken from the output of the multiplexer 109 and is passed to the instruction buffer 110 (lines 400, 500).

In the second cycle of operation, the address of the next instruction B to be executed is calculated and placed at the output of the multiplexer 109 (lines 400, 500). The new address is also put into the instruction address register (IAR) 103. The instruction A is loaded into the first operations register 120 from the instruction buffer 110 (lines 410, 510).

In the third cycle of operation, the address in the instruction buffer 110 of the next instruction to be executed is calculated and placed at the output of multiplexer 109 (lines 400, 500). In this example, this next instruction is the branch instruction BXLE. It could, of course, be any branch instruction. The instruction A is passed from the first operations register 120 to the second operations register 130 (lines 420, 520) and the instruction B is passed from the instruction buffer 110 into the first operations register 120 (lines 410, 510).

The instructions A and B in the first operations register 120 and the second operations register 130 are decoded and executed as is well known in the prior art.

In the fourth cycle of operation, the BXLE instruction is passed from the instruction buffer 110 into the first operations register 120 where it is decoded and executed (lines 410, 510).

Decoding and execution of the BXLE instruction is carried out as known in the prior art. The contents of the R1 and R3 fields are passed to the address decoder 200 which decodes the addresses of the registers in the general purpose register file 250 in which the values of the first operand, increment value and compare value are stored. The decoded addresses are passed through the multiplexers to the general purpose register file 250 where a read operation is carried out (lines 430, 530).

In the fifth cycle of operation the BXLE instruction is passed from the first operations register 120 to the second operations register 130 (lines 420, 520). However, the BXLE instruction also remains in the first operations register 120 as it requires at least two cycles for its completion. In [...] calculated for the next instruction (NSI) to be carried out on [...]. The contents of the registers in the general purpose register file 250 indicated by the addresses stored in the [...] first operand and the increment value, are read into the A register 260 and the B register 270 (lines 440, 540). In the arithmetic and logic unit 280 the contents of the A register 260 and B register 270 are added together (lines 460, 560). The value stored in the R3 field of the BXLE instruction is also used to address in the general purpose register file 250 the value in the odd register (lines 430, 530), i.e. the compare value, as explained above.

In the sixth cycle of operation, a different procedure is carried out depending on whether the branch condition is successful or not. In FIG. 4A, an example of a successful branch condition is shown, i.e. the loop operation continues. In this case, the sum (S) of the first operand and increment value, i.e. the value stored in the register indicated by the address in the R1 field and in the even register indicated by the R3 field, is written into the general purpose register file 250 (line 450) and also into the A register 260 (line 440). The contents of the odd register indicated by the address stored in the field R3 in the BXLE instruction are read into the B register 270 (line 440).

In the arithmetic and logic unit 280, the contents of the odd register indicated by the address in the field R3 of the BXLE instruction are subtracted from the sum of the values of the contents of the registers indicated by the values of the fields R3 and R1 (line 460). If this value is positive or zero then the branch condition is fulfilled, a successful branch condition signal (SFB) is issued (line 470) and the value of the address in the D2 field of the BXLE instruction is passed to the IAR modifier 107 to indicate the address of the target instruction (TGI) (line 400).

If, however, as is shown in FIG. 4b, the value in the arithmetic and logic unit 280 is negative, then the branch condition is not fulfilled (line 560). No successful branch condition signal is issued (indicated by dotted line; line 570). The next instruction (NSI) is passed into the first operations register 120 (line 510) where it is decoded and execution begun.

Comparing FIGS. 4a and 4b, it is seen that the interval from the point at which the BXLE instruction is passed into the first operations register 120 until the point at which a next, non-branch instruction is passed to the first operations register requires either three cycles (FIG. 4a) if the branch condition is successful or two cycles (FIG. 4b) if it is not successful. The successful completion of the BXLE requires a wait cycle in cycle 7 in the second operations register 130 as is shown by the no operation (NOP) instruction (line 420).

The processing of a branch instruction according to the invention will now be described. Let us suppose now that the program comprises a loop containing a series of four instructions F1 to F4 which are implemented in a floating point unit, followed by a BXLE instruction. Such a set of instructions is typical of a vector operation. Of course, the loop could in real life be much longer than this. The loop is executed n times.
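
As a rough illustration of a loop closed by BXLE, the semantics set out earlier (add the increment to the first operand, compare the sum with the compare value, branch when the sum is low or equal) can be modelled in a few lines of Python. The register file as a plain list, the function names bxle and count_passes, and the omission of 32-bit wraparound and of the D2 target address are assumptions made for this sketch; they are not part of the disclosure.

```python
def bxle(regs, r1, r3):
    """Model one BXLE step; return True when the branch is taken."""
    increment = regs[r3]
    # Even R3: the compare value is in the odd register of the pair.
    # Odd R3: r3 | 1 == r3, so one register supplies both values.
    compare = regs[r3 | 1]
    total = regs[r1] + increment      # first operand plus increment
    regs[r1] = total                  # sum replaces the first operand
    return total <= compare           # "low or equal" means branch


def count_passes(regs, r1, r3):
    """Run a loop closed by BXLE and count passes of the loop body."""
    passes = 0
    while True:
        passes += 1                   # loop body (F1 to F4) would run here
        if not bxle(regs, r1, r3):
            return passes
```

With an index of 0 in register 1, an increment of 8 in register 4 and a compare value of 24 in register 5, count_passes(regs, 1, 4) gives four passes: the sums 8, 16 and 24 branch back and the sum 32 falls through.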
FIG. 5a shows the first pass of the loop. As was described
in connection with FIGS. 4a and 4b, IAR modifier 107
calculates the address in the instruction buffer 110 in which
the instructions are to be found (line 600). The instructions
are passed from the instruction buffer 110 to the first
operations register 120 to the second operations register 130
in successive cycles as is shown in lines 605 and 610. For
simplicity only the instructions F1 to F4 and BXLE are
shown in the FIGS. 5a to 5c. On FIGS. 5a to 5c, the lines
which correspond to one another are given the same number.

In the sixth cycle of the first loop (FIG. 5a), the BXLE
instruction is passed to the first operations register 120 (line
605), in the seventh cycle it is passed to the second operations
register 130 (line 610) and in the eighth cycle the
branch condition is evaluated (line 665) as was described in
connection with FIGS. 4a and 4b. In the first loop, the
branch condition is successful and a successful branch
condition signal (SFB) is issued (line 630) which, in the
subsequent cycle, sets the successful branch (SFB) latch
240. The successful branch latch 240 remains set until it is
reset as will be described later in connection with FIG. 5c.

The successful branch condition signal (SFB) will also
store the target address to which the branch is made (i.e. the
address indicated in the D2 field of the BXLE instruction) in the
branch target register 140. In addition the addresses of the
registers given in the R1 and R3 fields of the BXLE
instruction are stored in the supplementary register 220. The
address of the branch instruction itself will be stored in the
branch address register 165.

The successful branch condition signal (SFB) also sets in
the ninth cycle the branch A1 latch (line 640) which in turn
sets, in the tenth cycle, the branch A2 latch (line 645). The
branch A1 latch and the branch A2 latch remain set for only
one cycle each as is shown in FIGS. 5a-5c.

FIG. 5b illustrates the further passes, from pass 2 to pass
n-1 (i.e. penultimate pass) in the loop. In these passes, the
floating point instructions F1 to F4 are passed directly from
the first operations register 120 and second operations
register 130 to the floating point unit where they are
executed (lines 600 to 610).

Parallel to the execution of the floating point instructions
F1 to F4 in the floating point unit, the branch condition is
calculated in the fixed point unit as will be now described.

As shown in the third cycle of FIG. 5b and described above,
the successful branch condition signal sets the branch A1
latch (line 640). This is an equivalent step to the ninth cycle
of FIG. 5a. The branch A1 latch causes the values from
registers indicated in R1 and R3 which are stored in the
supplementary register 220 to be read from the general
purpose register file 250 (line 655) into the A register 260 or
the B register 270. This is controlled by the multiplexers 225
and 230 which are switched by a signal from the successful
branch latch 240. The successful branch condition signal
(SFB) sets a branch B2 latch (line 650).

In the fourth cycle of FIG. 5b, the sum (S) of the first
operand stored in the general purpose register file 250 at the
address given in the R1 field of the BXLE instruction and the
increment value stored in the general purpose register file
250 at the address given in the R3 field of the BXLE
instruction is calculated in the arithmetic and logic unit 280
(line 665) and it is then written back into either the A register
260 or the B register 270. The value in the general purpose
register file 250 at the address given by the odd part of R3 of the
BXLE command is read into the other one of the A register
260 or the B register 270. The address is obtained from the
supplementary register 220 and is controlled by the multiplexer 221
which is triggered when the branch A2 latch is active (line
645).
In the fifth cycle of FIG. 5b, the branch condition is again
checked (line 665) and, as long as it is still fulfilled, the
successful branch latch 240 will remain set (line 625).

The end of the loop is determined by looking at the
address of the instruction in the IAR modifier 107 and seeing
whether this is the same as the address of the last instruction
in the loop, i.e. the address of the BXLE instruction. As was
noted above, the address of the last instruction in the loop is
stored in the branch address register 165. The comparison is
made in the address comparator 160 shown in FIG. 2. If the
two addresses are equal, then a signal is produced
(line 675) and instead of the address of the BXLE
instruction being passed to the instruction buffer so that the
BXLE instruction is read into the first operations register 120,
the address of the first instruction in the loop F1 is read out
of the branch target register 140 and passed to the instruction
buffer. The multiplexer 109 is controlled by a signal from the
AND gate 150 as described above.
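
The selection made by the multiplexer 109 can be sketched in software as follows. This is only an illustrative model under stated assumptions: the class name LoopControl, its method names and the use of integers for addresses are hypothetical, and the cycle-by-cycle timing of the latches is ignored. The comments tie each field to the reference numeral used in the description.

```python
from dataclasses import dataclass


@dataclass
class LoopControl:
    branch_target: int = 0      # branch target register 140
    branch_address: int = 0     # branch address register 165
    sfb_latch: bool = False     # successful branch (SFB) latch 240

    def capture(self, bxle_address, target_address):
        """First pass: BXLE was executed and the branch was successful."""
        self.branch_address = bxle_address
        self.branch_target = target_address
        self.sfb_latch = True

    def next_fetch_address(self, iar_modifier_address, sbc):
        """Model multiplexer 109 driven by AND gate 150."""
        # Address comparator 160: has the end of the loop been reached?
        match = iar_modifier_address == self.branch_address
        if self.sfb_latch and sbc and match:     # AND gate 150
            return self.branch_target            # skip BXLE, fetch F1
        return iar_modifier_address              # normal sequential fetch
```

After capture(bxle_address=0x110, target_address=0x100), a call next_fetch_address(0x110, True) returns 0x100, while any other address, or a cleared latch, passes the IAR modifier address through unchanged.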
The successful branch condition signal (SFB) also sets the
branch B2 latch. This latch controls the writing of the
calculated sum (S) of the first operand and the increment
value into the general purpose register file 250 at the address
indicated by R1 of the BXLE instruction.
FIG. 5c illustrates the final progression through the loop
of the floating point instructions F1 through F4. The calculations
in the first to fourth cycles of FIG. 5c proceed as
described above. In the fifth cycle, the branch condition is
not fulfilled (line 665). In this case, the successful branch
condition signal (SFB) is not issued (line 630) and, as a
result, the successful branch condition latch 240 is not set
(line 625). Dotted lines on FIG. 5c indicate absence of the
signal. As a result of this, no signal will be issued from the
AND gate 150 to the multiplexer 109 and the branch
instruction BXLE will be passed from the instruction buffer
110 to the first operations register 120 (line 605). The branch
condition will then be evaluated in the prior art fashion and,
as it is not fulfilled, the next instruction NSI addressed by the
IAR modifier 107 is read into the first operations register 120 from
the instruction buffer and executed as was described above.

Using the apparatus of the current invention one can
reduce the effective time required for the evaluation of the
branch condition from three cycles to zero cycles. This is
achieved by performing the calculation relevant to the
branch condition in the fixed point unit whilst the floating
point unit calculations are performed in the floating point
unit.
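
To put rough numbers on this saving, the following sketch counts cycles for the example loop of four one-cycle instructions followed by BXLE. It is a simplified model, not taken from the figures: it assumes that the loop body costs four cycles per pass, that an executed BXLE costs three cycles when the branch is successful and two when it is not, and that in the overlapped scheme only the first and the last pass execute the BXLE instruction. The function names are hypothetical.

```python
def prior_art_cycles(n, body=4, taken=3, not_taken=2):
    """Cycles for n passes when every BXLE is executed in the pipeline."""
    return n * body + (n - 1) * taken + not_taken


def overlapped_cycles(n, body=4, taken=3, not_taken=2):
    """Cycles for n passes when intermediate passes hide the evaluation."""
    if n < 2:
        return prior_art_cycles(n, body, taken, not_taken)
    # First pass: BXLE executed and successful.
    # Last pass: BXLE executed and not successful.
    return n * body + taken + not_taken
```

Under these assumptions a loop of 100 passes takes 699 cycles in the prior art model and 405 cycles in the overlapped model; for two passes both models give 13 cycles, since no intermediate pass exists.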

Based on the foregoing, apparatus and method for executing
a branch or other loop instruction have been disclosed.
However, numerous modifications and substitutions can be
made without deviating from the scope of the present
invention. Therefore, the invention has been disclosed by
way of illustration and not limitation, and reference should
be made to the following claims to determine the scope of
the present invention.
We claim:

1. Method for executing a program comprising instructions
in a loop, said method comprising the steps of:
in a first processor, performing a first execution of said
instructions and concurrently in a second processor,
processing a loop end condition;

storing a branch target address in a branch target register
and storing an address of a branch instruction in a
branch address register, said branch target address
being an address of a first instruction in the loop;
performing a subsequent execution of said instructions in
said first processor, and while avoiding execution of
said branch instruction, evaluating instead a branch
condition in said first processor and, if the branch
condition is fulfilled, detecting an end of the loop by
comparing an effective address of a next instruction to
be executed with contents of the stored address of the
branch instruction;
passing the stored branch target address from said branch
target register to said second processor for re-executing
said instructions; and
executing said instructions until the branch condition is no
longer fulfilled, and then addressing a next instruction
outside of said loop; and wherein
said evaluating step comprises the steps of calculating an
index value for the branch instruction, comparing the
calculated index value with a branch condition value,
and setting a successful branch indicator latch if the
branch condition value is equal to the calculated index
value; and said calculating step comprises the step of
retrieving from an index register a previous value of the
index value and adding to said previous value a value
given in the branch instruction.
2. A method for executing a program utilizing a main 
processor, a co-processor, a general purpose register and a 
loop control means for processing loops of instructions 
including a branch instruction which specifies a loop index, 
an increment and a branch address and determines whether 
ttie loop has to be repeated or exited, said method con^s- 
ing the steps of: 
said main processor generating current addresses of said 
instructions and said co-processor concurrently execut- 
ing said instructions; 
separately storing address data of said index and said 

increment during a loop execution; 
storing said branch address and an address of the branch 
instruction when during processing of a loop said 
branch instruction is first time executed; 
storing a branch condition if during a first execution of the 
loop the branch instruction indicated that a branch 
condition is fulfilled, and maintaining said branch 
condition stored during all loop execution cycles until
said loop control means indicates that the branch con- 
dition is no longer fulfilled; 
updating said index while said loop is executed a next
time in said co-processor, and generating at an end of
said next loop execution a successful branch condition
indication if the branch condition is still fulfilled, said
successful branch condition indication confirming storage
of said branch condition for a next loop cycle;
repeating step (e) as long as the branch condition is
fulfilled;

comparing said branch instruction address and said cur- 
rent instruction address to determine when during 
execution of the loop said branch address has been 
reached; and 

replacing said current instruction address by said branch
address if step (g) resulted in a match and step (e) or (f)
resulted in a branch condition, thereby suppressing the
execution of said branch instruction until the branch 
condition ceases to be fulfilled; 
calculating an index value for the branch instruction;
comparing the calculated index value with a branch
condition value;
setting a successful branch indicator latch if the branch 
condition value is equal to the calculated index value;
executing again the loop instructions if the successful
branch indicator latch is set, otherwise jumping to a
next instruction outside the loop to be executed; and
retrieving from an index register a previous value of the
index value and adding to the previous value a value
given in the branch instruction; and
wherein said branch instruction is a branch on index low
or equal instruction.

3. A computer system including a main processor for per-
forming address modification, a co-processor for executing
loop instructions and a general purpose register, said system
comprising:

loop control means for processing loops of instructions
including a branch instruction which specifies a loop
index, an increment and a branch address and deter-
mines whether the loop has to be repeated or exited;
instruction address modifier means for generating a cur-
rent instruction address during execution of a loop;
first register means for storing address data of said index
and said increment during the loop execution;
second register means for storing said branch address and
an address of the branch instruction when during a loop
processing said branch instruction is first time
executed;

branch latch means which are set if during a first execu-
tion of the loop the branch instruction determined that
the branch condition is fulfilled, where said branch
latch means remains set during all loop execution
cycles until said loop control means indicates that the
branch condition is no longer fulfilled;
means for updating said index while the loop is executed
in the co-processor, and generating at an end of a loop
execution a successful branch condition signal if the
branch condition is still fulfilled, said successful branch
condition signal confirms the set status of said branch
latch means;

means for comparing said branch instruction address and
said current instruction address to indicate that during
execution of the loop said branch address has been
reached; and

multiplexer means controlled by a match output of said
comparing means, by the set status of said latch means
and by said successful branch condition signal for
replacing said current instruction address by said
branch target address from said second register means
and thereby suppressing the execution of said branch
instruction until the branch condition ceases to be
fulfilled; and

wherein said first register means comprises a supplemen-
tary register for storing numbers of registers for said
branch instruction, one of said registers containing an
index and the other of said registers containing an
increment.

4. The computer system according to claim 3, wherein
said means for updating said index includes means for
calculating an index value for the branch instruction and
means for comparing the calculated index value with a
branch condition value, wherein said comparing means
controls the setting of said branch latch means if the branch
condition is fulfilled.

5. The computer system according to claim 3, further
comprising second and third latch means for controlling the
calculation of the branch condition and the operation of said
comparing means.

6. A method for processing loops of instructions including
a branch instruction which specifies a loop index, an incre-
ment and a branch instruction address, said method com-
prising the steps of:
(a) in a first processor, generating a current instruction
address during execution of a loop; 

(b) separately storing address data of the index and the 
increment during execution of said loop; 

(c) storing the branch instruction address and a branch 
target address when during loop processing said branch 
instruction is executed a first time; 

(d) storing a branch condition if during a first execution of 
the loop the branch instruction indicates that a branch 
condition is fulfilled, and maintaining said stored 
branch condition during other execution(s) of said loop
until the branch condition is no longer fulfilled;

(e) in a second processor, executing said loop a next time 
and concurrently modifying the current instruction 
address and evaluating the branch condition in said first 
processor by calculating an index value for the branch 
instruction using said separately stored address data 
and said separately stored increment and by comparing
the calculated index value with a value of the branch 
condition and generating a successful branch condition 
indication if the branch condition value is equal to the 
calculated index value; 

(f) repeating step (e) as long as the branch condition is 
fulfilled, and otherwise addressing and executing in 
said first processor a next instruction outside of the 
loop; 

(g) during each performance of step (e), accessing the
stored branch instruction address and comparing it with
the modified current instruction address to determine
when during execution of the loop said branch instruc- 
tion address has been reached; and 

(h) replacing said current instruction address by said 
branch target address if step (g) resulted in a match and 
step (e) resulted in a successful branch condition, 
thereby suppressing execution of said branch instruc- 
tion in said first processor until the branch condition 
ceases to be fulfilled.
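
Steps (a) through (h) of claim 6 can be pictured with a small software model. The following Python sketch is illustrative only and is not part of the claims. The function name `run_loop`, the address layout (body at addresses 0 to `body_len - 1`, branch instruction at `body_len`) and the variable names are assumptions made for the example. The branch condition is modeled as a low-or-equal test, following the instruction named in claim 9, although the claims word the test as the branch condition value being equal to the calculated index value. What happens at the branch address once the condition fails is left out of the model.

```python
def run_loop(body_len, index, increment, limit):
    """Return (body addresses executed, address replacements, final index)."""
    target_addr, branch_addr = 0, body_len   # hypothetical address layout

    # First pass: the body runs and the branch instruction executes normally.
    executed = list(range(target_addr, branch_addr))
    index += increment
    latch = index <= limit                   # assumed low-or-equal test
    # Step (c): store the branch instruction address and the branch target.
    branch_addr_reg, target_reg = branch_addr, target_addr

    replacements = 0
    while latch:                             # step (d): stored branch condition
        # Step (e): the index is updated and the condition re-evaluated
        # while the loop body is executed a next time.
        index += increment
        latch = index <= limit
        pc = target_reg
        while pc != branch_addr_reg:         # step (g): address comparison
            executed.append(pc)
            pc += 1
        if latch:
            # Step (h): the current address is replaced by the target,
            # so the branch instruction itself is not executed.
            replacements += 1
    return executed, replacements, index
```

With a two-instruction body, a start index of 0, an increment of 1 and a limit of 2, the model runs the body three times and replaces the address once; the other backward jump is the branch executed normally at the end of the first pass.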

7. A method according to claim 6 wherein said loop 
instruction controls mathematical operations. 

8. A method according to claim 6 wherein said second 
processor is a floating point unit.

9. A method according to claim 6 wherein said branch 
instruction is a branch on index low or equal instruction. 

10. A method according to claim 6 wherein step (e) 
comprises the steps of: 

calculating an index value for the branch instruction by 
retrieving from an index register a previous value of the
index value and adding to said previous value a value 
given in the branch instruction; 

comparing the calculated index value with a branch
condition value; and 

setting a successful branch indicator latch if the branch 
condition value is equal to the calculated index value. 

11. A first processor for processing loops of instructions 
including a branch instruction which specifies a loop index,
an increment and a branch instruction address, said first
processor comprising: 

means for generating a current instruction address during
execution of a loop;
first register means for separately storing address data of
the index and said increment during execution of said
loop;

second register means for storing the branch instruction 
address and a branch target address when during loop 
processing said branch instruction is executed a first 
time; 



latch means for storing a branch condition if during a first 
execution of the loop the branch instruction indicates 
that the branch condition is fulfilled, and maintaining 
said stored branch condition during other execution(s) 
of said loop until said loop control means indicates that 
the branch condition is no longer fulfilled; 
means for modifying the current instruction address and 
evaluating the branch condition while said loop is 
executed a next time in a second processor, by calcu- 
lating an index value for the branch instruction using 
the contents of said first register means, said separately 
stored address data and said separately stored incre- 
ment and by comparing the calculated index value with 
a value of the branch condition and generating a 
successful branch condition indication if the branch
condition value is equal to the calculated index value; 
means for repeating operation of the modifying and 
evaluating means as long as the branch condition is 
fulfilled, and otherwise addressing and executing in 
said first processor a next instruction outside of the
loop; 

means for comparing during each operation of the modi- 
fying and evaluating means, said branch instruction 
address stored in said second register means and the 
modified current instruction address to determine when 
during execution of the loop said branch instruction 
address has been reached; and 
means for replacing said modified current instruction 
address in said modifying means by said branch target 
address if the comparing means indicates a match and 
the operation of the modifying and evaluating means 
resulted in a branch condition, thereby suppressing 
execution of said branch instruction in the first proces- 
sor until the branch condition ceases to be fulfilled. 
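
The replacing means at the end of claim 11 amounts to a two-way selection between the current instruction address and the stored branch target address. A minimal Python sketch of that selection follows. It is illustrative only and not part of the claims; the function name `select_address` and its parameter names are hypothetical.

```python
def select_address(current_addr, branch_addr_reg, target_reg, condition_fulfilled):
    """Pick the address to use next, in the manner of the replacing means."""
    # Comparing means: has the stored branch instruction address been reached?
    match = current_addr == branch_addr_reg
    # Replace the current address only on a match with the condition fulfilled.
    if match and condition_fulfilled:
        return target_reg
    return current_addr
```

With a stored branch address of 8 and a stored target of 0, `select_address(8, 8, 0, True)` yields 0, while a missing match or an unfulfilled condition leaves the current address unchanged.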

12. The processor according to claim 11, wherein said first 
register means comprises supplementary registers storing 
the register numbers of said branch instruction, one of said
supplementary registers containing said index and the other 
of said supplementary registers containing said increment. 

13. The processor according to claim 11, wherein said 
means for modifying the instruction address and for evalu- 
ating the branch condition includes means for calculating an 
index value for the branch instruction and means for comparing
the calculated index value with a value of the branch
condition, wherein said comparing means directs the setting
of said branch latch means if the branch condition is
fulfilled.

14. The processor according to claim 11, wherein said
latch means comprises second and third latches to control
the calculation of the branch condition and the operation of
said comparing means.

15. Method for executing a program comprising instruc- 
tions in a loop, said method comprising the steps of: 

in a first processor, performing a first execution of said 
loop instructions and concurrently in a second 
processor, processing a loop end condition; 
storing a branch target address of a first instruction in the
loop and storing an address of a branch instruction;
performing a subsequent execution of said loop instruc- 
tions in said first processor, and while avoiding execu- 
tion of said branch instruction, evaluating instead a 
branch condition in said first processor and, if the 
branch condition is fulfilled, detecting an end of said 
loop by comparing an effective address of a next 
instruction to be executed with the stored address of the 
branch instruction; 
passing the stored branch target address to said second
processor for re-executing said instructions; and 

executing said loop instructions until the branch condition 
is no longer fulfilled, and then addressing a next 
instruction outside of said loop; and wherein

said evaluating step comprises the steps of calculating an
index value for the branch instruction, comparing the
calculated index value with a branch condition value,
and setting a successful branch indicator latch if the
branch condition value is equal to the calculated index
value; and

said calculating step comprises the step of retrieving a 
previous value of the index value and adding to said 
previous value a value given in the branch instruction. 